
    Screening for a Reweighted Penalized Conditional Gradient Method

    The conditional gradient method (CGM) is widely used in large-scale sparse convex optimization: it has a low per-iteration computational cost for structured sparse regularizers and collects nonzeros greedily. We explore the sparsity-acquiring properties of a general penalized CGM (P-CGM) for convex regularizers and a reweighted penalized CGM (RP-CGM) for nonconvex regularizers, replacing the usual convex constraints with gauge-inspired penalties. This generalization does not noticeably increase the per-iteration complexity. Without assuming bounded iterates or using line search, we show O(1/t) convergence of the gap of each subproblem, which measures distance to a stationary point. We couple this with a screening rule that is safe in the convex case, converging to the true support at a rate O(1/δ^2), where δ ≥ 0 measures how close the problem is to degeneracy. In the nonconvex case the screening rule converges to the true support in a finite number of iterations, but is not necessarily safe in the intermediate iterates. In our experiments, we verify the consistency of the method and adjust the aggressiveness of the screening rule by tuning the concavity of the regularizer.
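    As a rough illustration of the penalized subproblem that replaces the usual constrained LMO, here is a minimal Python sketch, assuming an l1 gauge penalty and the standard open-loop step size 2/(t + 2); the function names, the ball radius, and the thresholded oracle are hypothetical choices for the example, not the paper's exact construction.

    import numpy as np

    def make_l1_penalized_lmo(lam, radius):
        # Penalized subproblem for an l1 gauge: minimize <g, s> + lam * ||s||_1
        # over the l1 ball of the given radius. The minimizer is supported on
        # the coordinate with the largest |g_i|, and is exactly zero whenever
        # max_i |g_i| <= lam -- this thresholding keeps the iterates sparse.
        def lmo(g):
            i = np.argmax(np.abs(g))
            s = np.zeros_like(g)
            if np.abs(g[i]) > lam:
                s[i] = -radius * np.sign(g[i])
            return s
        return lmo

    def penalized_cgm(grad_f, penalized_lmo, x0, n_iters=500):
        # Generic penalized conditional gradient loop: at each iteration,
        # solve the penalized linear subproblem and take a convex-combination
        # step with the open-loop schedule 2/(t + 2) (no line search).
        x = x0.copy()
        for t in range(n_iters):
            s = penalized_lmo(grad_f(x))
            gamma = 2.0 / (t + 2.0)
            x = (1.0 - gamma) * x + gamma * s
        return x

    Each oracle call touches at most one coordinate, which is what keeps the per-iteration cost low; a reweighted variant for nonconvex regularizers would, roughly speaking, rerun this loop with the penalty reweighted around the current iterate.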

    Reducing Discretization Error in the Frank-Wolfe Method

    The Frank-Wolfe algorithm is a popular method in structurally constrained machine learning applications, owing to its low per-iteration cost. However, one major limitation of the method is a slow rate of convergence that is difficult to accelerate due to erratic, zig-zagging step directions, even asymptotically close to the solution. We view this as an artifact of discretization; that is to say, the Frank-Wolfe flow, which is its trajectory at asymptotically small step sizes, does not zig-zag, and reducing discretization error goes hand-in-hand with producing a more stable method with better convergence properties. We propose two improvements: a multistep Frank-Wolfe method that directly applies optimized higher-order discretization schemes, and an LMO-averaging scheme with reduced discretization error, whose local convergence rate over general convex sets accelerates from O(1/k) up to O(1/k^{3/2}).
    Comment: The 26th International Conference on Artificial Intelligence and Statistics (AISTATS) 2023. arXiv admin note: text overlap with arXiv:2205.1179
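    To make the multistep idea concrete, the sketch below applies a midpoint (second-order Runge-Kutta style) discretization to the Frank-Wolfe flow over an l1 ball: the LMO is re-evaluated at a half-step point before the full update is taken. The helper names, the l1 feasible set, and the step-size schedule are assumptions for this example; the paper's optimized higher-order schemes and its LMO-averaging variant are not reproduced here.

    import numpy as np

    def lmo_l1(g, radius=1.0):
        # Linear minimization oracle over the l1 ball:
        # argmin_{||s||_1 <= radius} <g, s> is a signed, scaled basis vector.
        i = np.argmax(np.abs(g))
        s = np.zeros_like(g)
        s[i] = -radius * np.sign(g[i])
        return s

    def midpoint_frank_wolfe(grad_f, x0, n_iters=500, radius=1.0):
        # Midpoint (RK2-style) Frank-Wolfe sketch: probe the LMO at a half
        # step before committing to the full update, reducing discretization
        # error relative to the single-stage (Euler-like) standard step.
        x = x0.copy()
        for t in range(n_iters):
            gamma = 2.0 / (t + 2.0)
            s1 = lmo_l1(grad_f(x), radius)          # stage 1: standard FW direction
            x_mid = x + 0.5 * gamma * (s1 - x)      # half step along that direction
            s2 = lmo_l1(grad_f(x_mid), radius)      # stage 2: LMO at the midpoint
            x = x + gamma * (s2 - x)                # full step, corrected direction
        return x

    Both stages keep the iterate inside the feasible set, since each update is a convex combination of feasible points; averaging the two LMO outputs instead of using only the corrected direction, as in Heun's method, would be one simple way to realize an averaging scheme of the kind described above.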